Velocity SIMD CPU Runtime (Runtime + Scalar x2) #1055
Merged (+8,733 −50)
m4rs-mt force-pushed the velocity2 branch 9 times, most recently from 2cbb2cd to 93652f7 (August 30, 2023 21:23)
m4rs-mt force-pushed the velocity2 branch 2 times, most recently from 9b35027 to f01fe60 (August 30, 2023 23:49)
m4rs-mt changed the title from "Velocity SIMD CPU Runtime (Scalar)" to "Velocity SIMD CPU Runtime (Runtime + Software SIMDx2)" (Aug 30, 2023)
m4rs-mt changed the title from "Velocity SIMD CPU Runtime (Runtime + Software SIMDx2)" to "Velocity SIMD CPU Runtime (Runtime + Scalar x2)" (Aug 30, 2023)
m4rs-mt force-pushed the velocity2 branch 10 times, most recently from 3fe39a5 to 86a9007 (September 6, 2023 06:58)
…y generated parameter instances to Velocity kernels.
…d transfer scalar arguments into the vector world.
…ering of blocks in the schedule.
…the scalar Velocity code generator.
…velocity accelerators.
m4rs-mt force-pushed the velocity2 branch 2 times, most recently from 42fed6f to 716d7c3 (September 26, 2023 13:47)
@MoFtZ I addressed an issue in the |
MoFtZ approved these changes (Sep 27, 2023)
This PR adds the announced SIMD-based CPU runtime, fully implemented in managed code. The new Velocity accelerator supports most GPU kernels (except those using dynamic shared memory) and can utilize SIMD hardware acceleration on modern CPUs, allowing you to run your ILGPU kernels efficiently on the CPU by leveraging the implemented automatic vectorization engine. Once all Velocity PRs have been merged, it will support the following hardware configurations:
- 128-bit X64 SSE and ARM64 Neon instructions (also supports M1 Macs via Mac M Series Support #769, in progress)
- 256-bit X64 AVX instructions (in progress)
- 512-bit X64 AVX-512 instructions (limited feature set; some functions will fall back to 256-bit registers)

Please note that this is the initial PR, which adds support for building and managing Velocity devices. It also contains the fully featured code generator that creates SIMD-based instructions from ILGPU IR nodes. However, it does not yet contain the SIMD code-generation plugins for the backend code generator; these will be added later on. This PR integrates CI support contributed by @MoFtZ in #1096.
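To make the idea behind the automatic vectorization engine more concrete, here is a conceptual sketch of the general technique of running per-thread (SIMT) kernel code on SIMD lanes. Each kernel thread is mapped to one lane; a register of N bits holds N / 32 lanes of 32-bit values (4 lanes for 128-bit SSE/Neon, 8 for 256-bit AVX, 16 for 512-bit registers); and a divergent `if`/`else` is handled by evaluating both sides and blending the results per lane under a mask. This is an assumption-laden illustration, not Velocity's code generator or API: lanes are simulated with plain Python lists, and `lanes_for`, `scalar_kernel`, and `vectorized_kernel` are hypothetical names.

```python
from typing import List


def lanes_for(register_bits: int, element_bits: int = 32) -> int:
    """Number of SIMD lanes a register of the given width offers for one element type."""
    return register_bits // element_bits


def scalar_kernel(index: int, data: List[int]) -> None:
    """Hypothetical scalar kernel: the code a single GPU thread would run."""
    if data[index] % 2 == 0:
        data[index] = data[index] // 2
    else:
        data[index] = 3 * data[index] + 1


def vectorized_kernel(data: List[int], lanes: int) -> None:
    """Runs scalar_kernel's logic for `lanes` threads at a time, one thread per simulated lane."""
    n = len(data)
    for base in range(0, n, lanes):
        # Thread indices for this group; lanes past the end of the data are inactive.
        idx = [base + lane for lane in range(lanes)]
        active = [i < n for i in idx]
        # Masked load: inactive lanes receive a harmless placeholder value.
        vals = [data[i] if a else 0 for i, a in zip(idx, active)]
        # The branch condition becomes a per-lane mask.
        cond = [v % 2 == 0 for v in vals]
        # Both branch bodies are evaluated for every lane ...
        then_vals = [v // 2 for v in vals]
        else_vals = [3 * v + 1 for v in vals]
        # ... and a per-lane select (blend) keeps the result of the taken branch.
        result = [t if c else e for t, e, c in zip(then_vals, else_vals, cond)]
        # Masked store: only active lanes write back.
        for i, a, r in zip(idx, active, result):
            if a:
                data[i] = r


if __name__ == "__main__":
    reference = list(range(1, 11))
    for i in range(len(reference)):
        scalar_kernel(i, reference)
    for bits in (128, 256, 512):
        data = list(range(1, 11))
        vectorized_kernel(data, lanes_for(bits))
        assert data == reference
    print(reference)  # [4, 1, 10, 2, 16, 3, 22, 4, 28, 5]
```

On real SIMD hardware, each list comprehension above would correspond to a single vector instruction. Computing both branch bodies and blending them is the usual cost of divergent control flow in this style of vectorization.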
Note that this PR is a new version of PR #891.
This PR depends on #1059, #1061, #1062, #1063, #1064, #1065, #1066, #1067, #1068, #1069, #1070, #1071, #1072, #1073, #1074, #1079, and #1081.

Co-authored-by: MoFtZ [email protected]